Dense Triangular Solvers on Multicore Clusters using UPC
Abstract
Partitioned Global Address Space (PGAS) languages have grown in popularity in recent years thanks to their high programmability and to the performance they achieve by exploiting data locality efficiently. This paper describes the implementation of efficient parallel dense triangular solvers in the PGAS language Unified Parallel C (UPC). The solvers are built on top of sequential BLAS functions and exploit the particularities of the PGAS paradigm. Furthermore, the numerical routines include an automatic process that adapts the algorithms to the characteristics of the system on which they run. The triangular solvers have been evaluated experimentally on two different multicore clusters and compared with message-passing-based counterparts, demonstrating good scalability and efficiency.
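To make the idea of a dense triangular solver built from block operations concrete, here is a minimal sequential sketch of blocked forward substitution for a lower-triangular system L x = b. It is not the paper's UPC implementation: the function name `solve_lower_blocked` and the parameter `block_size` are hypothetical, and plain Python loops stand in for the sequential BLAS routines (a triangular solve on each diagonal block, a matrix-vector update on the rows below it).

```python
def solve_lower_blocked(L, b, block_size):
    """Solve L x = b for a lower-triangular L, one block of rows at a time.

    Illustrative sketch only: names and the blocking scheme are assumptions,
    not the API or algorithm of the paper.
    """
    n = len(b)
    x = list(b)  # working copy; holds the solution on return
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        # Diagonal block: ordinary forward substitution
        # (the role a BLAS triangular solve would play).
        for i in range(start, end):
            for j in range(start, i):
                x[i] -= L[i][j] * x[j]
            x[i] /= L[i][i]
        # Rows below the block: subtract the contribution of the
        # entries just solved (the role of a BLAS matrix-vector product).
        # These row updates are independent of one another, which is
        # what a parallel version could distribute among threads.
        for i in range(end, n):
            for j in range(start, end):
                x[i] -= L[i][j] * x[j]
    return x


if __name__ == "__main__":
    L = [[2.0, 0.0, 0.0],
         [1.0, 3.0, 0.0],
         [4.0, 5.0, 6.0]]
    b = [2.0, 7.0, 32.0]
    print(solve_lower_blocked(L, b, block_size=2))
```

The block size controls how much work each diagonal solve and each update covers; choosing it to suit the machine is the kind of tuning the abstract attributes to the automatic adaptation process, although how the paper does this is not described here.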
Similar resources
UPCBLAS: a library for parallel matrix computations in Unified Parallel C
Partitioned Global Address Space (PGAS) languages have grown in popularity in recent years thanks to their high programmability and to the performance they achieve by exploiting data locality efficiently, especially on hierarchical architectures such as multicore clusters. This paper describes UPCBLAS, a parallel numerical library for dense matrix computations using the PGAS Unified Paralle...
Performance Evaluation of MPI, UPC and OpenMP on Multicore Architectures
The current trend toward multicore architectures underscores the need for parallelism. While new languages and alternatives for supporting these systems more efficiently are proposed, MPI faces this new challenge. Therefore, up-to-date performance evaluations of current options for programming multicore systems are needed. This paper evaluates MPI performance against Unified Parallel C (UPC) and Ope...
Trading Replication for Communication in Parallel Distributed-Memory Dense Solvers
We present new communication-efficient parallel dense linear solvers: a solver for triangular linear systems with multiple right-hand sides and an LU factorization algorithm. These solvers are highly parallel and they perform a factor of 0.4·P^(1/6) less communication than existing algorithms, where P is the number of processors. The new solvers reduce communication at the expense of using more tempora...
On the Performance of an Algebraic Multigrid Solver on Multicore Clusters
Algebraic multigrid (AMG) solvers have proven to be extremely efficient on distributed-memory architectures. However, when executed on modern multicore cluster architectures, we face new challenges that can significantly harm AMG’s performance. We discuss our experiences on such an architecture and present a set of techniques that help users to overcome the associated problems, including thread...
A Clustering Approach to Scientific Workflow Scheduling on the Cloud with Deadline and Cost Constraints
One of the main features of High Throughput Computing systems is the availability of high power processing resources. Cloud Computing systems can offer these features through concepts like Pay-Per-Use and Quality of Service (QoS) over the Internet. Many applications in Cloud computing are represented by workflows. Quality of Service is one of the most important challenges in the context of sche...